Provenance Data Storage
Abstract
Provenance research has generally focused on issues with data collection and organization. Most approaches represent stored provenance data as a directed acyclic graph (DAG), where objects such as files and processes are nodes in the graph and directed edges specify ancestry relationships between them. While there has been some work addressing logical compression of these provenance graphs, efficient physical storage of provenance data remains unaddressed. In approaching this problem, we implemented and evaluated several techniques tailored for provenance storage, which were inspired by existing representations of general semi-structured data. We considered variants of vertical partitioning, PASS, and RDF, varying two kinds of compression. We compared query runtime, disk usage, and data load time across these storage methods. Our results indicate that vertical partitioning performs best in most cases, while the benefit of compression varies by query.
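As a rough illustration of the representation described above, the following minimal sketch stores a tiny provenance DAG as (subject, property, value) triples and then vertically partitions it into one two-column table per property. The node names, the `input` and `type` property names, and both helper functions are hypothetical and chosen only for illustration; this is not the schema or code evaluated in the paper.

```python
from collections import defaultdict

# Hypothetical provenance triples: (subject, property, value).
# An "input" triple is an ancestry edge: the subject was derived from the value.
TRIPLES = [
    ("proc:sort", "type", "process"),
    ("file:in.txt", "type", "file"),
    ("file:out.txt", "type", "file"),
    ("proc:sort", "input", "file:in.txt"),
    ("file:out.txt", "input", "proc:sort"),
]


def vertically_partition(triples):
    """Group triples into one two-column (subject, value) table per property."""
    tables = defaultdict(list)
    for subject, prop, value in triples:
        tables[prop].append((subject, value))
    return dict(tables)


def ancestors(tables, node, edge_property="input"):
    """Return every node reachable from `node` by following ancestry edges."""
    parents = defaultdict(list)
    for child, parent in tables.get(edge_property, []):
        parents[child].append(parent)
    seen, stack = set(), [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


TABLES = vertically_partition(TRIPLES)
```

Under these assumptions, an ancestry query such as `ancestors(TABLES, "file:out.txt")` only needs to read the `input` table, which is the usual motivation for partitioning by property.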
Similar resources
A Distributed Provenance Aware Storage System
The provenance of a file represents the origin and history of the file data. A Distributed Provenance Aware Storage System (DPASS) tracks the provenance of files in a distributed file system. The provenance information can be used to identify potential dependencies between files in a filesystem. Some applications of provenance tracking include (i) tracking the transformations applied to process...
Inferring Fine-Grained Data Provenance in Stream Data Processing: Reduced Storage Cost, High Accuracy
Fine-grained data provenance ensures reproducibility of results in decision making, process control and e-science applications. However, maintaining this provenance is challenging in stream data processing because of its massive storage consumption, especially with large overlapping sliding windows. In this paper, we propose an approach to infer fine-grained data provenance by using a temporal ...
Provenance for the Cloud
The cloud is poised to become the next computing environment for both data storage and computation due to its pay-as-you-go and provision-as-you-go models. Cloud storage is already being used to back up desktop user data, host shared scientific data, store web application data, and to serve web pages. Today’s cloud stores, however, are missing an important ingredient: provenance. Provenance is ...
A Software Framework for Data Provenance
Data provenance refers to the historical record of the derivation of the data, allowing the reproduction of experiments, interpretation of results and identification of problems through the analysis of the processes that originated the data. Data provenance contributes to the evaluation of experiments. This paper presents a framework for data provenance using the W3C provenance data model, call...
Provenance-Aware Storage Systems
A Provenance-Aware Storage System (PASS) is a storage system that automatically collects and maintains provenance or lineage, the complete history or ancestry of an item. We discuss the advantages of treating provenance as meta-data collected and maintained by the storage system, rather than as manual annotations stored in a separately administered database. We describe a PASS implementation, d...
Poster: Secure Provenance for Cloud Storage
Organizations are increasingly turning to the cloud for data processing and storage. Storing data in the cloud is advantageous for numerous reasons: the elasticity of cloud environments ensures that only storage used is paid for, while tasks such as backup, replication, and geographic diversification of data are effectively outsourced to cloud storage providers. However, unfettered access to th...